Here, I’m going to visualize the number of machine learning papers published in the field of medicine. Let’s query PubMed and look at the publication trend of machine learning papers indexed in the database. It seems that the first ever neural network was built in 1957 (https://www.forbes.com/sites/bernardmarr/2016/02/19/a-short-history-of-machine-learning-every-manager-should-read/#2b96042815e7), so the search starts from that year.
library(rentrez)
library(ggplot2)
library(plotly)
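# Count the PubMed records matching `term` that were published in `year`
search_year <- function(year, term) {
  # [PDAT] restricts the search to the publication date field
  query <- paste(term, "AND (", year, "[PDAT])")
  # retmax = 0: we only need the hit count, not the record IDs
  entrez_search(db = "pubmed", term = query, retmax = 0)$count
}
year <- 1957:2014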
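To see what search_year() sends to PubMed, it can help to build the query string without calling the API. The sketch below uses a helper named build_query, which is hypothetical (it is not part of rentrez) and simply mirrors the paste() call in search_year().

```r
# Hypothetical helper (not part of rentrez): builds the same query string
# as search_year(), but without contacting NCBI.
build_query <- function(year, term) {
  paste(term, "AND (", year, "[PDAT])")
}

# paste() separates its arguments with a single space by default
build_query(1957, "machine learning")
#> [1] "machine learning AND ( 1957 [PDAT])"
```

Note that the term is passed unquoted, so PubMed is free to interpret "machine learning" with its own term mapping rather than as an exact phrase.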
Let’s create a vector named papers that holds the number of matching papers for each year.
papers <- sapply(year, search_year, term="machine learning", USE.NAMES=FALSE)
Let’s build a data frame containing the number of papers and the year of publication
d <- data.frame(year = year, number_of_papers = papers)
d
## year number_of_papers
## 1 1957 1
## 2 1958 0
## 3 1959 0
## 4 1960 0
## 5 1961 0
## 6 1962 1
## 7 1963 0
## 8 1964 4
## 9 1965 1
## 10 1966 2
## 11 1967 1
## 12 1968 2
## 13 1969 2
## 14 1970 2
## 15 1971 0
## 16 1972 2
## 17 1973 1
## 18 1974 2
## 19 1975 3
## 20 1976 3
## 21 1977 1
## 22 1978 2
## 23 1979 2
## 24 1980 4
## 25 1981 1
## 26 1982 4
## 27 1983 3
## 28 1984 3
## 29 1985 5
## 30 1986 6
## 31 1987 3
## 32 1988 7
## 33 1989 8
## 34 1990 7
## 35 1991 10
## 36 1992 15
## 37 1993 22
## 38 1994 24
## 39 1995 32
## 40 1996 30
## 41 1997 34
## 42 1998 31
## 43 1999 35
## 44 2000 58
## 45 2001 82
## 46 2002 98
## 47 2003 124
## 48 2004 187
## 49 2005 248
## 50 2006 324
## 51 2007 414
## 52 2008 508
## 53 2009 600
## 54 2010 716
## 55 2011 1147
## 56 2012 1496
## 57 2013 1930
## 58 2014 2422
Now, let’s make an interactive plot of the number of papers per year of publication.
# mode must be "markers" (plural); "marker" makes plotly emit a warning
plot_ly(data = d, x = ~year, y = ~number_of_papers, type = "scatter", mode = "markers", marker = list(color = "blue")) %>%
  layout(title = "The rise of machine learning in biomedical sciences and life sciences",
         annotations = list(x = 1, y = -0.1, text = "based on data from NCBI/PUBMED",
                            showarrow = FALSE, xref = "paper", yref = "paper", xanchor = "right", yanchor = "auto",
                            xshift = 0, yshift = 0, font = list(size = 10, color = "blue")))
There is a sharp increase in the number of publications from the year 2000 onwards. To better visualize that, let’s build another plot restricted to 1990-2014.
# Keep only 1990-2014 and draw dashed lines with markers
plot_ly(data = d[d$year >= 1990, ], x = ~year, y = ~number_of_papers, type = "scatter", mode = "lines+markers", marker = list(color = "blue"), linetype = I("dash")) %>%
  layout(title = "The rise of machine learning in biomedical sciences and life sciences (1990-2014)",
         annotations = list(x = 1, y = -0.1, text = "based on data from NCBI/PUBMED",
                            showarrow = FALSE, xref = "paper", yref = "paper", xanchor = "right", yanchor = "auto",
                            xshift = 0, yshift = 0, font = list(size = 10, color = "blue")))
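One way to put a number on this increase is a log-linear fit, which is my own addition rather than part of the original analysis. The sketch below assumes that growth since 2000 is roughly exponential and hard-codes the 2000-2014 counts from the table printed above, so it runs without querying PubMed. The names recent, fit, growth_rate and doubling_time are illustrative.

```r
# Counts for 2000-2014, copied from the table printed above
recent <- data.frame(
  year = 2000:2014,
  number_of_papers = c(58, 82, 98, 124, 187, 248, 324, 414,
                       508, 600, 716, 1147, 1496, 1930, 2422)
)

# Log-linear fit: log(count) = a + b * year, so b is the annual growth rate
fit <- lm(log(number_of_papers) ~ year, data = recent)
growth_rate <- unname(coef(fit)["year"])

# Years needed for the annual count to double under this model
doubling_time <- log(2) / growth_rate
doubling_time
```

This is only a rough summary: it ignores the overall growth of PubMed itself, so part of the estimated rate reflects the database getting bigger.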
When I was trying to discuss this result, I realized that Hashem et al., 2017 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5760972/) found a quite similar trend, though they used a different R package (RISmed) and plotted the proportion of papers per million. I think that the roughly linear increase between 2005 and 2010 may be explained by the increased use of next-generation sequencing methods, which gave rise to big genomic data. Analysing such big data often requires dimension reduction and other machine learning methods. Linear regression models were found to be the most dominant machine learning technique in the life sciences over the past three decades.
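A per-million normalization like the one mentioned above could be sketched as follows. This is not the code used by Hashem et al.; per_million is a hypothetical helper, and the commented-out query assumes that searching for a bare year with the [PDAT] tag returns all records indexed for that year.

```r
# Hypothetical helper: papers on a topic per million records indexed that year
per_million <- function(topic_counts, total_counts) {
  stopifnot(length(topic_counts) == length(total_counts))
  1e6 * topic_counts / total_counts
}

# With a network connection, yearly totals could be fetched the same way as
# the topic counts (assumption, not run here):
# totals <- sapply(year, function(y)
#   entrez_search(db = "pubmed", term = paste0(y, "[PDAT]"), retmax = 0)$count)
# d$per_million <- per_million(d$number_of_papers, totals)

# Toy check with made-up totals (illustrative values, not real PubMed counts)
per_million(c(5, 10), c(1e6, 2e6))
#> [1] 5 5
```

Dividing by the yearly total separates the growth of machine learning papers from the growth of PubMed as a whole.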
In conclusion, more and more health researchers are using machine learning algorithms. I can’t wait to see the impact of deep learning algorithms on life sciences and medicine!